library(tidyverse)
library(magrittr)
library(cowplot)

#Data Load In:
urine_raw <- read_csv('~/Documents/Projects/Short_Term_Broccoli/Data/SFN_Targeted/Urine/Urine_R_Format.csv')
metadata <- read_csv('~/Documents/Projects/Short_Term_Broccoli/Data/Metadata/Meta_urine.csv')
#Clean the metadata
mdata_clean <- metadata %>%
  #Weight (grams) used as a  proxy for mL, convert to L
  mutate(across(c(4:9), ~.x/1000)) %>%
  #Pivot the data longer
  pivot_longer(cols = starts_with('vol'), names_to = 'time', values_to = 'volume') %>%
  #Rename the variables so they can treated as factors
  mutate(time = gsub('vol_', '', time)) %>%
  mutate(time = gsub('h','', time)) 

#Clean the urine data
urine_clean <- urine_raw %>%
  #Rename the time points so they match the metadata
  mutate(time = gsub('h','', time)) %>%
  #Join the data with the metadata
  left_join(., mdata_clean) %>%
  group_by(subject_id, time) %>%
  #Multiply all the uM by the (proxy) volumes to convert to uMol recovered
  mutate(across(starts_with('SFN'), ~.x*volume)) %>%
  #Remove the alfalfa groups so we are only focusing on the broccoli
  filter(treatment %in% c('BL', 'BU')) %>%
  #Convert to factors for analysis + graphing
  modify_at(c('subject_id', 'time', 'cohort'), as.factor) 
#Relevel the time variables to make it work
urine_clean$time %<>% fct_relevel(c('0', '3', '6', '24', '48', '72'))

#Convert to tidy data for ease of analysis
urine_tidy <- urine_clean %>%
  pivot_longer(cols = starts_with('SFN'),  names_to = 'metabolite', values_to = 'uMol') 

This is a preliminary analysis of the targeted SFN-metabolite data from the short term broccoli feeding study. The study was run between March 2021 and November 2021. Sample prepartion was completed by Dr. Carmen Wong, Mass Spectrometry analysis was conducted Dr. Jaweoo Choi, and data analysis by Yanni Bouranis.

This analysis is preliminary and still needs further refinement, however, it seeks to answer the following questions:

  1. Do the label and the unlabeled groups differ for total SFN metabolite excretion?

  2. Does total SFN metabolite excretion differ between cohorts?

  3. Does the metabolism of SFN-NAC and SFN-NIT look similar to what we saw in Lauren Atwell’s data?

  4. Which participants were the highest and lowest SFN-NIT and SFN-NAC excreters?

  5. What impact does dose have on the recovery of metabolites?

All plots are interactive and can be zoomed in on and can be hovered over to get more data about specific data points. The legend can be used to isolate points of interest. Single click an item in the legend to hide all members of that class and double click to hide all others but keep that one.

Question 1: Do the label and the unlabeled groups differ for total SFN metabolite excretion?

The first question we want to answer is if the labeled and unlabeled groups differ in the total SFN metabolites (SFN_Tot) excreted at each time point. Let’s start by plotting each time point:

Plots

We will plot our data as both a box and whiskers plot and a dotplot so we can better see inter-individual variation. For the dotplot, the mean for each treatment group is represented as a black diamond. Note: the y-axis scales are not the same to allow us to see differences across metabolites more easily.

Boxplot
DvH_box <- ggplot(urine_clean, aes(x = treatment, y = SFN_Tot, fill = treatment)) + 
  geom_boxplot() +
  facet_wrap(~time, scales = 'free_y') +
  xlab('Treatment') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  scale_x_discrete(labels = c('Labeled', 'Unlabeled')) +
  cowplot::theme_minimal_grid() +
  theme(legend.position = 'none')

plotly::ggplotly(DvH_box)
Dotplot
DvH_ind <- ggplot(urine_clean, aes(x = treatment, y = SFN_Tot, color = subject_id)) + 
  geom_point(aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excretion: ', SFN_Tot, ' µMol \n'))) +
  geom_point(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.9, size = 3.6, shape = 18) +
  facet_wrap(~time, scales = 'free_y') +
  cowplot::theme_minimal_grid() +
  xlab('Treatment') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  scale_x_discrete(labels = c('Labeled', 'Unlabeled')) +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(DvH_ind, tooltip = 'text')

From our plots, it looks like there is not much difference across our groups. Let’s verify that by running some stats.

Stats

For ease of analysis, instead of using a linear mixed model we will stratify our data by time point and use a simple Mann-Whitney U-Test (nonparametric t-test) to compare between the Label and the Unlabeled group. The interpretation of these results is that a significant p-value denotes that there is a significant difference between the labeled- and unlabeled-samples for total SFN metabolite excretion at each time point.

labelstats <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  group_by(time) %>%
  nest() %>%
  mutate(pval = map_dbl(data, function(x) wilcox.test(uMol ~ treatment, data = x)$p.val)) %>%
  dplyr::select(-data) %>%
  rename('Time' = time, 'p-Value' = pval)

knitr::kable(labelstats)
Time p-Value
0 0.6547343
3 0.7985884
6 0.0850208
24 0.2836463
48 0.1458033
72 0.2797874

Unsurprisingly, there is no significant differences between our labeled and unlabeled groups at any time points.

Question 2: Does total SFN metabolite excretion differ between cohorts?

Our next goal is to evaluate if there is a significant cohort effect occurring when it comes to total SFN metabolite excretion. The cohorts were run during different times of years which could influence the growth of the sprouts (i.e. light, heat conditions), as well as different biological conditions occurring such as weight, activity level, stress level, and water intake of our participants.

Let’s start by looking at the SFN-dose and grams of sprouts fed to each cohort:

Table
sproutdata <- read_csv('../Data/Metadata/sprout_meta.csv') %>%
  modify_at('cohort', as.factor) 

knitr::kable(sproutdata)
cohort treatment grams_fed SFN_fed seed_batch
1 BU 71.00 135.8 1
1 BL 58.00 160.5 1
2 BU 64.00 151.6 1
2 BL 54.00 161.7 1
3 BU 40.00 81.3 1
3 BL 35.00 89.2 1
4 BU 30.40 100.0 1
4 BL 41.60 100.0 1
5 BU 47.10 100.0 1
5 BL 36.38 100.0 1
6 BU 29.03 100.0 1
6 BL 29.66 100.0 1
7 BU 28.90 100.0 1
7 BL 22.00 100.0 1
8 BU 26.50 100.0 2
8 BL 34.50 100.0 2
Graph
spd <- ggplot(sproutdata, aes(x = cohort , y = SFN_fed, fill = treatment)) + 
  geom_col(position = 'dodge2',
           aes(text = paste0('Cohort: ', cohort, '\n',
                             'Grams Sprouts Fed: ', grams_fed, 'g \n',
                             'SFN-Equivalents Fed: ', SFN_fed, ' µmol \n',
                             'Seed Batch: ', seed_batch))) +
  cowplot::theme_minimal_hgrid() +
  xlab('Cohort') + 
  ylab('SFN-Equivalents Fed (µMol)') +
  ggtitle('SFN-Equivalents Fed by Cohort')




plotly::ggplotly(spd, tooltip = 'text')

Plots

Now we will plot the total SFN-metabolites recovered as both a boxplot and dotplot. Note: the y-axis scales are not the same to allow us to see differences across metabolites more easily.

Boxplots
CoI_box <- ggplot(urine_clean, aes(x = cohort, y = SFN_Tot, fill = cohort)) +
  geom_boxplot() +
  facet_wrap(~time, scales = 'free_y') +
  cowplot::theme_minimal_grid() +
  xlab('Cohort') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  theme(legend.position = 'none')

plotly::ggplotly(CoI_box)
Dotplots
CoI_ind <- ggplot(urine_clean, aes(x = cohort, y = SFN_Tot, color = subject_id, group = cohort)) + 
  geom_point(aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excretion: ', SFN_Tot, ' µMol \n'))) +
  geom_point(stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.9, size = 3.6, shape = 18) +
  facet_wrap(~time, scales = 'free_y') +
  cowplot::theme_minimal_grid() +
  xlab('Cohort') + 
  ylab('Total SFN Metabolite Excretion (µMol)') +
  ggtitle('Total SFN Metabolite Excretion') +
  scale_color_discrete(name = 'Participant \n ID')


plotly::ggplotly(CoI_ind, tooltip = 'text')

Overall, it looks like most cohorts are quite similar, however, Cohorts 1 and 2 definitely have higher levels of excretion compared to the rest. For the 0h time point, it looks like participants BSS007 and BSS013 in Cohorts 1 and 2, respectively, made have made a mistake during their dietary restrictions too.

Let’s see how this plays out in our stats:

Stats

For ease of analysis, data will be stratified by time point and then analyzed across cohorts using an ANOVA. The interpretation of these results it that a significant p-value denotes a significant difference between at least 2 cohorts for that time point.

cohortstats <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  group_by(time) %>%
  nest() %>%
  mutate(pval = map_dbl(data, function(x) anova(lm(uMol ~ cohort, data = x))$"Pr(>F)"[1])) %>%
  dplyr::select(-data) %>%
  rename('Time' = time, 'p-Value' = pval)

knitr::kable(cohortstats)
Time p-Value
0 0.3288060
3 0.2021590
6 0.0007957
24 0.0001501
48 0.0000024
72 0.0014641

It looks like at our later time points we get see significant differences. From looking at our plots, I can assume that this is being driven by cohorts 1 and 2 which were fed higher amounts of sulforaphane than the other cohorts. Let’s verify this by looking at the results of the linear model we made to do our ANOVA.

To interpret this output, we look at the Coefficients section of the output. (Intercept) is Cohort 1 and Estimate for this coefficient is the estimated mean of this cohort for that time point. For the other cohorts, Estimate represents the difference between its mean and the mean of Cohort 1. Pr(>|t|) is the p-value resulting from a t-test between each cohort and Cohort 1 (the (Intercept) term) - (ignore the p-value associated with the (Intercept) term). If a coefficient (Cohort) is found to be significant, it can be interpreted that the mean in total SFN metabolites excreted between that Cohort and Cohort 1 are significantly different.

cdata_list <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  group_by(time) %>%
  nest() 

walk2(.x = cdata_list$data, 
      .y = list('0h', '3h', '6h', '24h', '48h', '72h'),
      function(x,y){
        print(y)
        print(summary(lm(uMol ~ cohort, data = x)))
      })
## [1] "0h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37942 -0.00285  0.00000  0.00000  1.13825 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.3794     0.1225   3.098   0.0042 **
## cohort2      -0.3025     0.1732  -1.746   0.0910 . 
## cohort3      -0.3710     0.1732  -2.142   0.0404 * 
## cohort4      -0.3794     0.1581  -2.400   0.0228 * 
## cohort5      -0.3794     0.1871  -2.028   0.0515 . 
## cohort6      -0.3766     0.1581  -2.382   0.0238 * 
## cohort7      -0.3794     0.1500  -2.530   0.0169 * 
## cohort8      -0.3794     0.1871  -2.028   0.0515 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2449 on 30 degrees of freedom
## Multiple R-squared:  0.2199, Adjusted R-squared:  0.03786 
## F-statistic: 1.208 on 7 and 30 DF,  p-value: 0.3288
## 
## [1] "3h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.901  -9.074  -2.494   6.986  29.597 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   48.139      7.200   6.686 2.47e-07 ***
## cohort2       -9.384     10.183  -0.922   0.3644    
## cohort3      -13.991     10.183  -1.374   0.1800    
## cohort4      -18.596      9.660  -1.925   0.0641 .  
## cohort5      -26.478     10.999  -2.407   0.0227 *  
## cohort6      -24.012      9.296  -2.583   0.0151 *  
## cohort7      -21.234      8.819  -2.408   0.0226 *  
## cohort8      -21.374     10.999  -1.943   0.0617 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.4 on 29 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2675, Adjusted R-squared:  0.09069 
## F-statistic: 1.513 on 7 and 29 DF,  p-value: 0.2022
## 
## [1] "6h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.232 -14.307  -3.131  10.101  56.011 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    75.04      10.08   7.443  2.7e-08 ***
## cohort2        12.92      14.26   0.906  0.37215    
## cohort3       -34.07      14.26  -2.390  0.02334 *  
## cohort4       -40.77      13.01  -3.133  0.00385 ** 
## cohort5       -26.52      15.40  -1.722  0.09536 .  
## cohort6       -45.71      13.01  -3.512  0.00143 ** 
## cohort7       -40.93      12.35  -3.315  0.00240 ** 
## cohort8       -30.47      15.40  -1.979  0.05710 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.16 on 30 degrees of freedom
## Multiple R-squared:  0.5374, Adjusted R-squared:  0.4294 
## F-statistic: 4.978 on 7 and 30 DF,  p-value: 0.0007957
## 
## [1] "24h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.500  -7.647  -0.751   9.526  36.503 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   83.641      8.695   9.619 1.12e-10 ***
## cohort2       -6.444     12.297  -0.524  0.60410    
## cohort3      -41.513     12.297  -3.376  0.00205 ** 
## cohort4      -47.142     11.225  -4.200  0.00022 ***
## cohort5      -47.374     13.282  -3.567  0.00124 ** 
## cohort6      -52.518     11.225  -4.679 5.77e-05 ***
## cohort7      -50.858     10.649  -4.776 4.39e-05 ***
## cohort8      -37.519     13.282  -2.825  0.00833 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.39 on 30 degrees of freedom
## Multiple R-squared:  0.5919, Adjusted R-squared:  0.4967 
## F-statistic: 6.217 on 7 and 30 DF,  p-value: 0.0001501
## 
## [1] "48h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1168 -1.8467  0.1342  1.7926  4.9284 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   14.894      1.393  10.689 9.48e-12 ***
## cohort2       -2.503      1.971  -1.270 0.213690    
## cohort3       -8.655      1.971  -4.392 0.000129 ***
## cohort4       -8.848      1.799  -4.919 2.93e-05 ***
## cohort5       -7.369      2.128  -3.462 0.001632 ** 
## cohort6      -11.503      1.799  -6.395 4.65e-07 ***
## cohort7      -10.978      1.707  -6.433 4.18e-07 ***
## cohort8       -9.771      2.128  -4.591 7.38e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.787 on 30 degrees of freedom
## Multiple R-squared:  0.6978, Adjusted R-squared:  0.6273 
## F-statistic: 9.896 on 7 and 30 DF,  p-value: 2.403e-06
## 
## [1] "72h"
## 
## Call:
## lm(formula = uMol ~ cohort, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.08313 -0.43230 -0.04454  0.36891  2.85011 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.1035     0.4780   6.493 3.55e-07 ***
## cohort2      -0.8936     0.6760  -1.322 0.196200    
## cohort3      -2.2756     0.6760  -3.366 0.002101 ** 
## cohort4      -2.3524     0.6171  -3.812 0.000638 ***
## cohort5      -1.9318     0.7302  -2.646 0.012851 *  
## cohort6      -2.7888     0.6171  -4.519 9.02e-05 ***
## cohort7      -2.6567     0.5854  -4.538 8.56e-05 ***
## cohort8      -1.9976     0.7302  -2.736 0.010348 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.956 on 30 degrees of freedom
## Multiple R-squared:  0.5152, Adjusted R-squared:  0.4021 
## F-statistic: 4.554 on 7 and 30 DF,  p-value: 0.001464

As suspected, Cohorts 1 and 2 are higher than all other cohorts at all time points. Let’s remove Cohorts 1 and 2 from our data and then try again to see if it still comes out as significantly different.

cohortstats_2 <- urine_tidy %>%
  #Filter down to only total sulforaphane content
  filter(metabolite == 'SFN_Tot') %>%
  #Remove cohorts 1 and 2
  filter(!cohort %in% c(1,2)) %>%
  group_by(time) %>%
  nest() %>%
  mutate(pval = map_dbl(data, function(x) anova(lm(uMol ~ cohort, data = x))$"Pr(>F)"[1])) %>%
  dplyr::select(-data) %>%
  rename('Time' = time, 'p-Value' = pval)

knitr::kable(cohortstats_2)
Time p-Value
0 0.3210257
3 0.8174697
6 0.6086423
24 0.7824433
48 0.2127658
72 0.2650680

As expected, Cohorts 1 and 2 were driving the significance that we observed earlier.

How do we want to handle these cohorts for future analyses?

Question 3: What does metabolism look like?

SFN-NAC

Next we want to see the metabolism of SFN-NAC and SFN-NIT in our participants. To control for variable urine production between individuals, µM (as reported by Carmen) has been converted to µMol recovered in urine using urine weight as a proxy for volume. We will start by looking at the metabolism of SFN-NAC by plotting it by both subject, to see inter-individual variation, and by Cohort to see means across cohorts.

Hovering over any data point will give you more information about that point and the plot can be zoomed in on. The grey bars represent the mean at each timepoint.

By Participant
NACall_sub <- ggplot(urine_clean, aes(x = time, y = SFN_NAC, group = subject_id, color = subject_id)) +
  geom_bar(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.5) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Total Excretion: ', SFN_NAC, ' µMol \n'))) +
  geom_path(size = 0.5) + 
  theme_minimal_hgrid() +
  ylab('SFN-NAC (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('SFN-NAC Excretion (µMol) by Participant') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(NACall_sub, tooltip = 'text')
By Cohort
ser <- function(x){
  sd(x)/sqrt(length(x))
}  

summary_NAC <- urine_tidy %>%
  filter(metabolite == 'SFN_NAC') %>%
  group_by(time, cohort) %>%
  drop_na() %>%
  summarise(mean = round(mean(uMol, na.rm = TRUE),3),
            SE = round(ser(uMol), 3),
            n = n())

NACsum <- ggplot(summary_NAC, aes(x = time, y = mean, fill = cohort)) +
  geom_errorbar(aes(ymin = mean - SE, ymax = mean+SE,
                    text = paste('Standard Error: ±', SE, ' µMol')), width = 0.2) +
  geom_col(aes(text = paste0('Cohort: ', cohort, '\n',
                             'Time: ', time, '\n',
                             'Mean SFN-NAC: ', mean, ' µmol \n',
                             'n: ', n))) +
  theme_minimal_hgrid() +
  ylab('SFN-NAC (µmol)') +
  xlab('Time (Hours)') +
  facet_wrap(~cohort, ncol = 4) +
  ggtitle('SFN-NAC Excretion by Cohort') +
  theme(legend.position = 'none')

plotly::ggplotly(NACsum, tooltip = 'text')

SFN-NIT

Next we will look at SFN-NIT metabolism. To recap, in Lauren Atwell’s data we saw an increase in metabolism from 0h to 24 hours then for some participants there was a decline between 24 and 48 hours while other saw an increase.

By Pariticipant:
NITall_sub <- ggplot(urine_clean, aes(x = time, y = SFN_NIT, group = subject_id, color = subject_id)) +
  geom_bar(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.5) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Time Point: ', time, ' hours \n',
                               'SFN-NIT Excretion: ', SFN_NIT, ' µMol \n'))) +
  geom_path(size = 0.5) + 
  theme_minimal_hgrid() +
  ylab('SFN-NIT (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('SFN-NIT Excretion (µMol) by Participant') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(NITall_sub, tooltip = 'text')
By Cohort
summary_NIT <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT') %>%
  group_by(time, cohort) %>%
  drop_na() %>%
  summarise(mean = round(mean(uMol, na.rm = TRUE),3),
            SE = round(ser(uMol), 3),
            n = n())

NITsum <- ggplot(summary_NIT, aes(x = time, y = mean, fill = cohort)) +
  geom_errorbar(aes(ymin = mean - SE, ymax = mean+SE,
                    text = paste('Standard Error: ±', SE, ' µMol')), width = 0.2) +
  geom_col(aes(text = paste0('Cohort: ', cohort, '\n',
                             'Time: ', time, '\n',
                             'Mean SFN-NIT: ', mean, ' µmol \n',
                             'n: ', n))) +
  theme_minimal_hgrid() +
  ylab('SFN-NIT (µmol)') +
  xlab('Time (Hours)') +
  facet_wrap(~cohort, ncol = 4) +
  theme(legend.position = 'none')

plotly::ggplotly(NITsum, tooltip = 'text')

Lastly we will look at the percent of the dose fed to help control for the unequal doses fed to Cohorts 1-3. Percent recovery is calculated by dividing the total SFN-metabolites recovered at each time point by the dose fed to that cohort.

Percent Dose Recovered:

By Pariticipant:
pct_data <- urine_clean %>%
  dplyr::select(subject_id, treatment, cohort, time, SFN_Tot) %>%
  left_join(sproutdata) %>%
  mutate(pct_rec = SFN_Tot/SFN_fed) 
  
pctI <- ggplot(pct_data, aes(x = time, y = pct_rec, group = subject_id, color = subject_id)) +
  geom_bar(aes(group = time),stat = 'summary', fun = mean, fill = 'grey', color = 'black', alpha = 0.5) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Time Point: ', time, ' hours \n',
                               '% Dose Recovered: ', round(pct_rec*100, 2), '% \n',
                               'SFN Dose Fed: ', SFN_fed, ' µmol'))) +
  geom_path(size = 0.5) + 
  theme_minimal_hgrid() +
  ylab('% Dose Recovered') +
  xlab('Time (Hours)') +
  ggtitle('% Dose Recovered by Participant') +
  scale_color_discrete(name = 'Participant \n ID') +
  scale_y_continuous(labels = scales::label_percent())

plotly::ggplotly(pctI, tooltip = 'text')
Percent Recovey by Cohort
pct_sum <- pct_data %>%
  group_by(time, cohort) %>%
  drop_na() %>%
  summarise(mean = round(mean(pct_rec, na.rm = TRUE),3),
            SE = round(ser(pct_rec), 3),
            n = n(),
            sf = mean(SFN_fed))

PCTsum <- ggplot(pct_sum, aes(x = time, y = mean, fill = cohort)) +
  geom_errorbar(aes(ymin = mean - SE, ymax = mean+SE,
                    text = paste('Standard Error: ±', round(SE*100, 2), ' %')), width = 0.2) +
  geom_col(aes(text = paste0('Cohort: ', cohort, '\n',
                             'Time: ', time, '\n',
                             'Mean % Recovered: ', round(mean*100, 2), '% \n',
                             'SFN Dose Fed: ', sf, 'µmol \n', 
                             'n: ', n))) +
  theme_minimal_hgrid() +
  ylab('% Dose Recovered') +
  xlab('Time (Hours)') +
  ggtitle('% Dose Recovered by Cohort') +
  facet_wrap(~cohort, ncol = 4) +
  scale_y_continuous(labels = scales::label_percent()) +
  theme(legend.position = 'none')

plotly::ggplotly(PCTsum, tooltip = 'text')

Interestingly, it looks like pattern in SFN-NIT that we saw in Lauren Atwell’s data isn’t observed here. One explanation could be the use of urine weight as a proxy for volume as opposed to true volume, as we used in Lauren Atwell’s data. The pattern in SFN-NAC excretion is also different from what we observed in Lauren’ Atwell’s data. The excretion of SFN-NIT between at 24 hours is fairly similar to what we saw in Lauren Atwell’s study, however. In her study, participants consumed 200 µmol of SFN-equivalents and ~23 µmol of SFN-NIT was recovered in urine at 24 hours, while in our study our particpants consumed 100 µmol of SFN-equivalents at ~12 µmol of SFN-NIT was recovered at 24 hours. In the Atwell study, there was also a 12 hour time point while we only looked at 24 hours which could also explain some of the differences we observed.

Lastly, let’s see at what time points SFN-NIT and SFN-NAC excretion was significantly different compared to 0hr. To evaluate this, we will use a nested mixed effects model. We will be using the model formula: uMol ~ time + (1|cohort/subject_id)

Interpretation of the results: Compared to 0h, does this time point significantly differ compare in µmol recovered?

library(lme4)
test <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT')

#Fit our model:
lm_time <- urine_tidy %>%
  filter(metabolite != 'SFN_Tot') %>%
  group_by(metabolite) %>%
  nest() %>%
  #Run the model:
  mutate(model = map(data, function(x) lmer(uMol ~ time + (1|cohort/subject_id), data = x))) %>%
  #Run the contrasts across each time point
  mutate(pval_3h = map_dbl(model, function(x) lmerTest::contest(x, c(0,1,0,0,0,0))$'Pr(>F)')) %>%
  mutate(pval_6h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,1,0,0,0))$'Pr(>F)')) %>%
  mutate(pval_24h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,0,1,0,0))$'Pr(>F)')) %>%
  mutate(pval_48h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,0,0,1,0))$'Pr(>F)')) %>%
  mutate(pval_72h = map_dbl(model, function(x) lmerTest::contest(x, c(0,0,0,0,0,1))$'Pr(>F)')) %>%
  dplyr::select(-c(data, model)) %>%
  rename('Metabolite' = metabolite, '3h p-Value' = pval_3h, '6h p-Value' = pval_6h, '24h p-Value' = pval_24h,
         '48h p-Value' = pval_48h, '72h p-Value' = pval_72h)

#Round the data to make it more readable
lmt_rounded <- lm_time %>% 
  modify_if(is.numeric, round, 4)

knitr::kable(lmt_rounded)
Metabolite 3h p-Value 6h p-Value 24h p-Value 48h p-Value 72h p-Value
SFN 0.0000 0.0000 0.0000 0.9943 0.9943
SFN_CYS 0.0000 0.0000 0.0000 0.8433 0.9819
SFN_NAC 0.0000 0.0000 0.0000 0.2817 0.8287
SFN_CG 0.6383 0.2571 0.0476 0.9956 0.9956
SFN_GSH 0.2243 0.0015 1.0000 1.0000 1.0000
SFN_NIT 0.0010 0.0000 0.0000 0.0000 0.4688

Question 4: Which participants were the highest and lowest SFN-NIT and SFN-NAC excreters?

To address this problem we can take three approaches:

  1. Find participants who excreted the highest cumulative amount SFN-NIT and SFN-NAC across all times points

  2. Find which participants excreted the maximum amount of SFN-NIT and SFN-NAC at a single time point.

  3. Find which participants had the greatest percent recovery using the two methods above

Cumulative Excretion

We will use our first strategy to look at cumulative metabolite production of NAC and NIT across all time points.

SFN-NAC
cumsum_NAC <- urine_tidy %>%
  filter(metabolite == 'SFN_NAC') %>%
  drop_na() %>%
  group_by(subject_id, cohort) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  ungroup()

fillvec <- rep('', length(cumsum_NAC$data))
fillvec[which(cumsum_NAC$data %in% head(cumsum_NAC$data[order(cumsum_NAC$data, decreasing = TRUE)], 10))] <- 'Highest'
fillvec[which(cumsum_NAC$data %in% tail(cumsum_NAC$data[order(cumsum_NAC$data, decreasing = TRUE)], 10))] <- 'Lowest'
cumsum_NAC$maxmin <- fillvec

csoutNAC <- cumsum_NAC %>% 
  dplyr::select(subject_id, cohort, data, maxmin) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Cumulative µMol Excreted' = data, 'Highest or Lowest Producer' = maxmin)

knitr::kable(csoutNAC, caption = 'Highest and Lowest Cumulative SFN-NAC Excreters')
Highest and Lowest Cumulative SFN-NAC Excreters
Participant Cohort Cumulative µMol Excreted Highest or Lowest Producer
BSS003 1 150.42256 Highest
BSS005 1 130.96709 Highest
BSS006 1 113.98006 Highest
BSS007 1 137.54865 Highest
BSS009 2 156.76076 Highest
BSS013 2 107.79796 Highest
BSS015 2 107.59466 Highest
BSS017 2 158.85693 Highest
BSS029 3 110.93733 Highest
BSS032 4 42.19422 Lowest
BSS035 4 138.58845 Highest
BSS039 4 18.99498 Lowest
BSS040 4 30.46430 Lowest
BSS047 5 59.77472 Lowest
BSS054 6 51.31053 Lowest
BSS056 6 27.05526 Lowest
BSS058 6 52.54184 Lowest
BSS068 7 28.00599 Lowest
BSS070 7 59.71429 Lowest
BSS072 7 37.51804 Lowest
SFN-NIT
cumsum_NIT <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT') %>%
  drop_na() %>%
  group_by(subject_id, cohort) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  ungroup()

fillvec <- rep('', length(cumsum_NIT$data))
fillvec[which(cumsum_NIT$data %in% head(cumsum_NIT$data[order(cumsum_NIT$data, decreasing = TRUE)], 10))] <- 'Highest'
fillvec[which(cumsum_NIT$data %in% tail(cumsum_NIT$data[order(cumsum_NIT$data, decreasing = TRUE)], 10))] <- 'Lowest'
cumsum_NIT$maxmin <- fillvec
csoutNIT <- cumsum_NIT %>% 
  dplyr::select(subject_id, cohort, data, maxmin) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Cumulative µMol Excreted' = data, 'Highest or Lowest Producer' = maxmin)

knitr::kable(csoutNIT, caption = 'Highest and Lowest Cumulative SFN-NIT Excreters')
Highest and Lowest Cumulative SFN-NIT Excreters
Participant Cohort Cumulative µMol Excreted Highest or Lowest Producer
BSS005 1 75.526580 Highest
BSS006 1 56.647162 Highest
BSS007 1 44.036687 Highest
BSS009 2 68.164386 Highest
BSS015 2 48.246656 Highest
BSS017 2 37.758331 Highest
BSS030 4 35.722998 Highest
BSS032 4 11.309145 Lowest
BSS035 4 35.311257 Highest
BSS039 4 5.412319 Lowest
BSS040 4 7.045884 Lowest
BSS046 6 13.323724 Lowest
BSS047 5 39.095553 Highest
BSS052 6 5.923686 Lowest
BSS054 6 7.821824 Lowest
BSS058 6 9.887403 Lowest
BSS064 7 10.159723 Lowest
BSS068 7 11.023327 Lowest
BSS080 8 35.809481 Highest
BSS083 8 12.356152 Lowest
Total Metabolites % Recovered
cumsum_pct <- urine_tidy %>%
  drop_na() %>%
  group_by(subject_id, cohort) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$pct_rec))) %>%
  ungroup()

cumsum_pct <- urine_tidy %>%
  filter(metabolite == 'SFN_Tot') %>%
  drop_na() %>%
  group_by(subject_id, cohort, treatment) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  ungroup() %>%
  left_join(sproutdata) %>%
  mutate(total_rec = (data/SFN_fed)*100)

fillvec <- rep('', length(cumsum_pct$total_rec))
fillvec[which(cumsum_pct$total_rec %in% head(cumsum_pct$total_rec[order(cumsum_pct$total_rec, decreasing = TRUE)], 10))] <- 'Highest'
fillvec[which(cumsum_pct$total_rec %in% tail(cumsum_pct$total_rec[order(cumsum_pct$total_rec, decreasing = TRUE)], 10))] <- 'Lowest'
cumsum_pct$maxmin <- fillvec
csoutpct <- cumsum_pct %>% 
  dplyr::select(subject_id, cohort, total_rec, maxmin) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Cumulative % Recovered' = total_rec, 'Highest or Lowest Producer' = maxmin)

knitr::kable(csoutpct, caption = 'Highest and Lowest Cumulative Percent Recovery')
Highest and Lowest Cumulative Percent Recovery
Participant Cohort Cumulative % Recovered Highest or Lowest Producer
BSS003 1 136.39875 Highest
BSS005 1 178.97876 Highest
BSS006 1 154.64583 Highest
BSS007 1 142.54768 Highest
BSS009 2 166.78647 Highest
BSS017 2 161.27763 Highest
BSS023 3 145.56821 Highest
BSS029 3 190.87845 Highest
BSS030 4 151.41691 Highest
BSS032 4 62.94983 Lowest
BSS035 4 203.58454 Highest
BSS039 4 28.06143 Lowest
BSS040 4 44.51580 Lowest
BSS054 6 71.99378 Lowest
BSS056 6 59.42894 Lowest
BSS058 6 79.72320 Lowest
BSS064 7 101.60157 Lowest
BSS068 7 44.83770 Lowest
BSS070 7 100.49896 Lowest
BSS072 7 66.78433 Lowest

For SFN-NAC, our top excreters are BSS017, BSS009, BSS003, BSS035, and BSS007.

For SFN-NIT, our top excreters are BSS005, BSS009, BSS006, BSS015, and BSS007.

Looking at the cumulative recovery of dose, the top excreters are BSS0035, BSS029, BSS005, BSS009, and BSS017.

There is quite a bit of overlap between our top NAC and NIT excreters and most of top excreters unsurprisingly come from Cohorts 1 and 2. BSS035, a top NAC excreter, is an interesting exception to this pattern.

When looking at cumulative recovery, we are seeing well over 100% of the fed dose being recovered. This is rather interesting but could be due to the conversion of erucin to SFN.

Let’s plot our highest and lowest excreters from the cumulative method:

High/Low NAC - Recovery
cs_final_NAC <- cumsum_NAC %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT')) 
  
  
cs_plot_r <- ggplot(cs_final_NAC, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', uMol, ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NAC Excreters - Recovery') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(cs_plot_r, tooltip = 'text')
High/Low NAC - Cumulative Excretion
cssum <- cs_final_NAC %>%
  group_by(subject_id, metabolite, maxmin) %>%
  mutate(cumsum = cumsum(uMol)) 


csplot_t <- ggplot(cssum, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative Excretion: ', cumsum, ' µMol \n'))) +
  geom_path(aes(group = subject_id), size =0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NAC Excreters - Cumulative') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(csplot_t, tooltip = 'text')
High/Low NIT - Recovery
cs_final_NIT <- cumsum_NIT %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT')) 
  
  
cs_plot_r_nit <- ggplot(cs_final_NIT, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', uMol, ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NIT Excreters - Recovery') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(cs_plot_r_nit, tooltip = 'text')
High/Low NIT - Cumulative Excretion
cssum_NIT <- cs_final_NIT %>%
  group_by(subject_id, metabolite, maxmin) %>%
  mutate(cumsum = cumsum(uMol)) 


csplot_t_nit <- ggplot(cssum_NIT, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative Excretion: ', cumsum, ' µMol \n'))) +
  geom_path(aes(group = subject_id), size = 0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NIT Excreters - Cumulative') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(csplot_t_nit, tooltip = 'text')

Maximum Recovery

Next we will search across each individual and find their maximum NAC and NIT excretion then search across all individuals to find the highest excretion.

SFN-NAC
maxminNAC <- urine_tidy %>%
  filter(metabolite == 'SFN_NAC') %>%
  drop_na() %>%
  #Group by subject ID and break down into individual dataframes by participant
  group_by(subject_id) %>%
  nest() %>%
  #Pull out the time point with max recovery for each metabolite and each individual
  mutate(data = map(data, function(x) x[which(x$uMol == max(x$uMol)),])) %>%
  #Put the dataframe back together
  unnest(cols = data) %>%
  arrange(uMol) %>%
  ungroup()

fillvec <- rep('', length(maxminNAC$uMol))
fillvec[which(maxminNAC$uMol %in% head(maxminNAC$uMol[order(maxminNAC$uMol, decreasing = TRUE)], 10))] <- 'Highest'
fillvec[which(maxminNAC$uMol %in% tail(maxminNAC$uMol[order(maxminNAC$uMol, decreasing = TRUE)], 10))] <- 'Lowest'
maxminNAC$maxmin <- fillvec
outtableNAC <- maxminNAC %>% 
  dplyr::select(subject_id, cohort, time, uMol, maxmin) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Time Recovered' = time, 'Receovered µMol' = uMol, 'Highest or Lowest Producer' = maxmin)

knitr::kable(outtableNAC, caption = 'Highest and Lowest SFN-NAC Producers')
Highest and Lowest SFN-NAC Producers
Participant Cohort Time Recovered Receovered µMol Highest or Lowest Producer
BSS003 1 3 55.76902 Highest
BSS005 1 6 51.93529 Highest
BSS006 1 24 52.24617 Highest
BSS007 1 6 54.78522 Highest
BSS009 2 6 86.60635 Highest
BSS013 2 6 49.20966 Highest
BSS017 2 24 57.61610 Highest
BSS021 3 6 21.80216 Lowest
BSS024 3 3 24.78421 Lowest
BSS029 3 6 45.97949 Highest
BSS032 4 6 19.54395 Lowest
BSS035 4 24 51.10126 Highest
BSS039 4 3 10.01200 Lowest
BSS040 4 24 17.57776 Lowest
BSS054 6 24 20.02766 Lowest
BSS056 6 3 11.80669 Lowest
BSS068 7 6 11.91840 Lowest
BSS070 7 3 27.15104 Lowest
BSS072 7 24 13.05457 Lowest
BSS080 8 6 42.61217 Highest
SFN-NIT
maxminNIT <- urine_tidy %>%
  filter(metabolite == 'SFN_NIT') %>%
  drop_na() %>%
  #Group by subject ID and break down into individual dataframes by participant
  group_by(subject_id) %>%
  nest() %>%
  #Pull out the time point with max recovery for each metabolite and each individual
  mutate(data = map(data, function(x) x[which(x$uMol == max(x$uMol)),])) %>%
  #Put the dataframe back together
  unnest(cols = data) %>%
  ungroup()

fillvec <- rep('', length(maxminNIT$uMol))
fillvec[which(maxminNIT$uMol %in% head(maxminNIT$uMol[order(maxminNIT$uMol, decreasing = TRUE)], 10))] <- 'Highest'
fillvec[which(maxminNIT$uMol %in% tail(maxminNIT$uMol[order(maxminNIT$uMol, decreasing = TRUE)], 10))] <- 'Lowest'
maxminNIT$maxmin <- fillvec
outtableNIT <- maxminNIT%>% 
  dplyr::select(subject_id, cohort, time, uMol, maxmin) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Time Recovered' = time, 'Receovered µMol' = uMol, 'Highest or Lowest Producer' = maxmin)

knitr::kable(outtableNIT, caption = 'Highest and Lowest SFN-NIT Producers')
Highest and Lowest SFN-NIT Producers
Participant Cohort Time Recovered Receovered µMol Highest or Lowest Producer
BSS005 1 24 40.676706 Highest
BSS006 1 24 31.925270 Highest
BSS007 1 24 19.252750 Highest
BSS009 2 24 28.322472 Highest
BSS015 2 24 23.726976 Highest
BSS017 2 24 19.636344 Highest
BSS030 4 24 16.494438 Highest
BSS032 4 24 5.097592 Lowest
BSS035 4 24 14.961527 Highest
BSS036 4 24 7.180311 Lowest
BSS039 4 3 1.778377 Lowest
BSS040 4 24 3.905985 Lowest
BSS047 5 24 17.056744 Highest
BSS052 6 24 2.766656 Lowest
BSS054 6 24 3.817469 Lowest
BSS058 6 24 6.197518 Lowest
BSS064 7 24 4.703800 Lowest
BSS068 7 24 6.546103 Lowest
BSS080 8 24 18.767448 Highest
BSS083 8 24 6.203923 Lowest
Total SFN Metabolites % Recovered
maxminpct <-  pct_data%>%
  drop_na() %>%
  #Group by subject ID and break down into individual dataframes by participant
  group_by(subject_id) %>%
  nest() %>%
  #Pull out the time point with max recovery for each metabolite and each individual
  mutate(data = map(data, function(x) x[which(x$pct_rec == max(x$pct_rec)),])) %>%
  #Put the dataframe back together
  unnest(cols = data) %>%
  ungroup()

fillvec <- rep('', length(maxminpct$pct_rec))
fillvec[which(maxminpct$pct_rec %in% head(maxminpct$pct_rec[order(maxminpct$pct_rec, decreasing = TRUE)], 10))] <- 'Highest'
fillvec[which(maxminpct$pct_rec %in% tail(maxminpct$pct_rec[order(maxminpct$pct_rec, decreasing = TRUE)], 10))] <- 'Lowest'
maxminpct$maxmin <- fillvec
outtablepct <- maxminpct %>% 
  dplyr::select(subject_id, cohort, time, pct_rec, maxmin) %>%
  filter(maxmin != '') %>%
  arrange(subject_id) %>%
  mutate(pct_rec = round(pct_rec*100, 2)) %>%
  rename('Participant' = subject_id, 'Cohort' = cohort, 'Time Recovered' = time, 'Recovered %' = pct_rec, 'Highest or Lowest Producer' = maxmin)

knitr::kable(outtablepct, caption = 'Highest and Lowest Percent Recovery')
Highest and Lowest Percent Recovery
Participant Cohort Time Recovered Recovered % Highest or Lowest Producer
BSS005 1 24 74.71 Highest
BSS006 1 24 70.62 Highest
BSS009 2 6 89.03 Highest
BSS017 2 24 59.70 Highest
BSS029 3 6 77.43 Highest
BSS030 4 24 66.88 Highest
BSS032 4 6 26.40 Lowest
BSS035 4 24 73.00 Highest
BSS039 4 3 14.26 Lowest
BSS040 4 24 24.79 Lowest
BSS054 6 24 27.72 Lowest
BSS056 6 3 20.94 Lowest
BSS058 6 24 39.51 Lowest
BSS063 7 6 55.96 Highest
BSS065 7 24 62.03 Highest
BSS066 7 6 36.56 Lowest
BSS068 7 6 17.21 Lowest
BSS070 7 3 39.93 Lowest
BSS072 7 24 24.21 Lowest
BSS080 8 6 61.40 Highest

Once again, we see quite a bit of overlap between our high NIT and NAC producers.

The highest NAC producers are BSS009, BSS017, BSS003, BSS007, and BSS006.

Similarly the highest NIT producers are BSS005, BSS006, BSS009, BSS015, and BSS017.

The highest percent recovery at a single timepoint was in BSS009, BSS029, BSS005, BSS006, and BSS035.

We can now plot our results to see how they look:

High/Low NAC - Recovery
mx_final <- maxminNAC %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT'))
  
  
mx_plot_r <- ggplot(mx_final, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(uMol,3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NAC Excreters - Recovery') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(mx_plot_r, tooltip = 'text')
High/Low NAC - Cumulative Excretion
mxsum <- mx_final %>%
  group_by(subject_id, metabolite, maxmin) %>%
  mutate(cumsum = cumsum(uMol)) 


mxplot_t <- ggplot(mxsum, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(cumsum,3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NIT Excreters - Cumulative') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(mxplot_t, tooltip = 'text')
High/Low NIT - Recovery
mxfinal_NIT <- maxminNIT %>%
  filter(maxmin != '') %>%
  select(subject_id, maxmin) %>%
  left_join(urine_tidy) %>%
  filter(metabolite %in% c('SFN_NAC', 'SFN_NIT')) 
  
  
mx_plot_r_nit <- ggplot(mxfinal_NIT, aes(x = time, y = uMol, color = subject_id,  group = subject_id)) +
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Recovered: ', round(uMol,3), ' µMol \n'))) +
  geom_path(size = 0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Metabolites Recovered (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NIT Excreters - Recovery') +
  scale_color_discrete(name = 'Participant \n ID')


plotly::ggplotly(mx_plot_r_nit, tooltip = 'text')
High/Low NIT - Cumulative Excretion
mxsum_NIT <- mxfinal_NIT %>%
  group_by(subject_id, metabolite, maxmin) %>%
  mutate(cumsum = cumsum(uMol)) 


mxplot_t_nit <- ggplot(mxsum_NIT, aes(x = time, y = cumsum, color = subject_id, group = subject_id)) + 
  geom_point(size = 2, aes(text = paste0('Participant ID: ', subject_id, '\n',
                               'Cohort: ', cohort, '\n',
                               'Group: ', maxmin, '\n',
                               'Time Point: ', time, ' hours \n',
                               'Cumulative Excretion: ', round(cumsum, 3), ' µMol \n'))) +
  geom_path(aes(group = subject_id), size = 0.5) +
  facet_wrap(~metabolite, ncol = 4) +
  theme_minimal_hgrid() +
  ylab('Cumulative Recovery (µmol)') +
  xlab('Time (Hours)') +
  ggtitle('High/Low SFN-NIT Excreters - Cumulative') +
  scale_color_discrete(name = 'Participant \n ID')

plotly::ggplotly(mxplot_t_nit, tooltip = 'text')

Which individuals do want to investigate further for our metabolomics experiment? Which method do we want to use to select our participants? Cumulative excretion or maximum recovery? Do we want to focus on the NIT or NAC list?

Question 5: What is the relationship between SFN consumed and excreted?

Now let’s see if there is a correlation between total SFN excretion and the amount eaten. We will check for each metabolite:

sproutcor <- urine_tidy %>%
  filter(metabolite %in% c('SFN_Tot', 'SFN_NIT', 'SFN_NAC')) %>%
  drop_na() %>%
  group_by(subject_id, cohort, metabolite, treatment) %>%
  nest() %>%
  mutate(data = map_dbl(data, function(x) sum(x$uMol))) %>%
  rename('uMol' = data) %>%
  ungroup() %>%
  left_join(sproutdata) %>%
  pivot_longer(cols = c('SFN_fed', 'grams_fed'), names_to = 'measure') %>%
  group_by(measure, metabolite) %>%
  nest() %>%
  mutate(rval = map_dbl(data, function(x) cor(x$value, x$uMol))) %>%
  unnest(col = 'data') %>%
  pivot_wider(names_from = 'measure', values_from = 'value') %>%
  pivot_wider(names_from = 'metabolite', values_from = 'uMol')

SFN-NAC
p1_NAC <- ggplot(sproutcor, aes(x = SFN_fed, y = SFN_NAC, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NAC, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN-NAC Recovered (µMol)') +
  theme(legend.position = 'none')
p1_NAC <- plotly::ggplotly(p1_NAC, tooltip = 'text')

p2_NAC <- ggplot(sproutcor, aes(x = grams_fed, y = SFN_NAC, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NAC, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_NAC <- plotly::ggplotly(p2_NAC, tooltip = 'text')


plotly::subplot(p1_NAC, p2_NAC, titleY = TRUE, titleX = TRUE, shareY = T) 
SFN-NIT
p1_NIT <- ggplot(sproutcor, aes(x = SFN_fed, y = SFN_NIT, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NIT, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN-NIT Recovered (µMol)') +
  theme(legend.position = 'none')
p1_NIT <- plotly::ggplotly(p1_NIT, tooltip = 'text')

p2_NIT <- ggplot(sproutcor, aes(x = grams_fed, y = SFN_NIT, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_NIT, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_NIT <- plotly::ggplotly(p2_NIT, tooltip = 'text')


plotly::subplot(p1_NIT, p2_NIT, titleY = TRUE, titleX = TRUE, shareY = T) 
Total SFN Metabolites
p1_Tot <- ggplot(sproutcor, aes(x = SFN_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN Metabolites Recovered (µMol)') +
  theme(legend.position = 'none')
p1_Tot <- plotly::ggplotly(p1_Tot, tooltip = 'text')

p2_Tot <- ggplot(sproutcor, aes(x = grams_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_Tot <- plotly::ggplotly(p2_Tot, tooltip = 'text')


plotly::subplot(p1_Tot, p2_Tot, titleY = TRUE, titleX = TRUE, shareY = T) 

There is a pretty clear relationship between the grams of sprouts fed and SFN metabolites recovered. Cohorts 4-8 are definitely interesting and show really high variation which is interesting. Let’s focus in on only those participants briefly and look at their total SFN metabolites:

cfocus <- sproutcor %>%
  filter(cohort %in% 4:8)

p1_f <- ggplot(cfocus, aes(x = SFN_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'SFN-Equivalents Fed: ', SFN_fed, ' µMol \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3)))) +
  cowplot::theme_minimal_grid() +
  xlab('µMol SFN Fed') +
  ylab('Total SFN Metabolites Recovered (µMol)') +
  theme(legend.position = 'none')
p1_f <- plotly::ggplotly(p1_f, tooltip = 'text')

p2_f <- ggplot(cfocus, aes(x = grams_fed, y = SFN_Tot, color = cohort, group = 1)) + 
  geom_point(aes(text = paste0('Participant: ', subject_id, '\n',
                               'Treatment: ', treatment, '\n',
                               'Cohort: ', cohort, '\n',
                               'Total Excreted: ', round(SFN_Tot, 3), ' µMol \n',
                               'Grams Sprouts Fed: ', grams_fed, ' g \n')))+
  geom_smooth(method = 'lm', se = F, color = 'black',
              aes(text = paste0("Pearon's r: ", round(rval, 3))))  +
  cowplot::theme_minimal_grid() +
  xlab('Grams of Sprouts Fed') +
  ylab('') +
  theme(legend.position = 'none')
p2_f <- plotly::ggplotly(p2_f, tooltip = 'text')


plotly::subplot(p1_f, p2_f, titleY = TRUE, titleX = TRUE, shareY = T) 

It does look like there is a relationship between the amount of sprouts consumed and the recovery of SFN-metabolites. Hopefully by using the hierarchical model it will help to control for this - definitely need to have more conversations with Duo about this issue. This is an interesting finding tho.